Targeted Gene Metagenomic Data Analysis ◾ 265
generates another file format, with the “.qzv” file extension, called a visualization file. A
visualization file is a standalone, shareable file that may contain any kind of output, such
as images, tables, and interactive representations. Plugin methods take QIIME2 artifacts
as input and produce one or more artifacts as output, whereas a plugin visualizer produces
a single visualization file for viewing or sharing. In the following, we will show you how
to use QIIME2 to analyze targeted gene metagenomic data. The general workflow of the
amplicon-based metagenomic data analysis is shown in Figure 7.1.
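Both file types can be inspected without QIIME2 itself. The sketch below assumes, as is the case for current QIIME2 releases, that a .qza or .qzv file is a ZIP archive whose top-level directory is named after the artifact's UUID and contains a metadata.yaml file recording the UUID, the type and the format. The helper names `read_archive_info` and `is_visualization` are hypothetical, and the line-by-line parsing is a simplification of real YAML.

```python
import zipfile
from typing import BinaryIO, Dict, Union


def read_archive_info(archive: Union[str, BinaryIO]) -> Dict[str, str]:
    """Return the key/value pairs in the top-level metadata.yaml of a .qza/.qzv.

    Assumes the layout <uuid>/metadata.yaml with one-line 'key: value' entries;
    a real YAML parser would be more robust.
    """
    with zipfile.ZipFile(archive) as zf:
        # The top-level metadata.yaml sits directly under the UUID directory;
        # provenance/ holds further metadata.yaml files at deeper paths.
        candidates = [
            name for name in zf.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError("no top-level metadata.yaml; not a QIIME2 archive?")
        text = zf.read(candidates[0]).decode("utf-8")
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_visualization(archive: Union[str, BinaryIO]) -> bool:
    """Visualizations (.qzv) record their type as 'Visualization'."""
    return read_archive_info(archive).get("type") == "Visualization"
```

For an artifact, the "type" entry holds its semantic type (for example, a sequence artifact might report `SampleData[PairedEndSequencesWithQuality]`), while a visualization simply reports `Visualization`.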
7.3.1 QIIME2 Input Files
Metagenomic amplicon-based raw data may be generated by a study conducted in your own
laboratory or downloaded from a database. Each study has a design tailored to its research
objectives, and this design usually dictates the workflow of the analysis.
Whether the raw data are from your own project or downloaded from a database, they usually
include (i) raw sequence data and (ii) metadata (information about the samples and the study
design). Analysis with QIIME2 requires importing these two inputs and converting them into
artifacts; from then on, only the artifacts are used in the analysis. Therefore, the first
task is to import the raw sequence data and metadata into artifacts. Each QIIME2 artifact
has a semantic type that enables QIIME2 to identify which artifacts are suitable inputs for
a given analysis. In the following, we discuss the raw sequence data and metadata in more
detail.
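Sample metadata starts out as a tab-separated sheet whose first column holds the sample IDs, and a malformed sheet is a common cause of failed analyses. The following is a minimal pre-flight check, written as a sketch under stated assumptions: the set of accepted ID headers is an assumed subset of what QIIME2 recognizes, the header must be the first non-blank row, and rows starting with "#" after the header (such as a `#q2:types` directive row) are skipped. The function name `check_metadata` is hypothetical, and QIIME2 performs its own, more thorough validation.

```python
import csv
import io
from typing import List

# Assumed, simplified subset of the ID-column headers QIIME2 accepts (case-insensitive).
ID_HEADERS = {"id", "sampleid", "sample id", "sample-id", "#sampleid", "#sample id"}


def check_metadata(tsv_text: str) -> List[str]:
    """Return the sample IDs from a tab-separated metadata sheet, or raise ValueError."""
    # Drop blank lines; csv.reader yields [] for them.
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t")
            if any(cell.strip() for cell in row)]
    if not rows:
        raise ValueError("metadata sheet is empty")
    header, body = rows[0], rows[1:]
    if header[0].strip().lower() not in ID_HEADERS:
        raise ValueError(f"unrecognized ID column header: {header[0]!r}")
    # Skip comment/directive rows such as '#q2:types' that follow the header.
    body = [row for row in body if not row[0].startswith("#")]
    ids = [row[0].strip() for row in body]
    if any(not sample_id for sample_id in ids):
        raise ValueError("a row has an empty sample ID")
    duplicates = sorted({s for s in ids if ids.count(s) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample IDs: {duplicates}")
    return ids
```

Checking for unique, non-empty IDs up front matters because every later step links sequences to samples through these IDs.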
7.3.1.1 Importing Sequence Data
QIIME2 accepts inputs in a variety of file formats, including FASTA files, FASTQ files, and
feature files of OTUs or representative sequences. FASTQ files (single-end or paired-end)
are the most commonly used. The reads in FASTQ files may be multiplexed or demultiplexed.
Multiplexed reads follow either the Earth Microbiome Project (EMP) protocol, in which the
barcode sequences are in a separate file, or a non-EMP layout, in which the barcodes are
embedded in the read sequences themselves [16]. Demultiplexed reads, in turn, may or may
not be in Casava 1.8 format.
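Whether demultiplexed reads are in Casava 1.8 format can usually be decided from the file names alone. The sketch below assumes the Casava 1.8 naming convention `<sample-id>_<barcode>_L<lane>_R<read>_001.fastq.gz` (for example, `L2S357_15_L001_R1_001.fastq.gz`); the regular expression and the function names are illustrative, not QIIME2's own code. If every file matches, it suggests QIIME2's `CasavaOneEightSingleLanePerSampleDirFmt` import format; otherwise it falls back to a manifest-based format (`PairedEndFastqManifestPhred33V2` or `SingleEndFastqManifestPhred33V2`), which assumes Phred+33 quality encoding.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed Casava 1.8 file-name pattern:
# <sample-id>_<barcode>_L<3-digit lane>_R<1|2>_001.fastq.gz
CASAVA_18 = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


class CasavaName(NamedTuple):
    sample: str
    barcode: str
    lane: int
    read: int  # 1 = forward, 2 = reverse


def parse_casava(filename: str) -> Optional[CasavaName]:
    """Split a Casava 1.8 file name into its parts, or return None if it does not match."""
    m = CASAVA_18.match(filename)
    if m is None:
        return None
    return CasavaName(m["sample"], m["barcode"], int(m["lane"]), int(m["read"]))


def suggest_input_format(filenames: Iterable[str], paired: bool) -> str:
    """Suggest a QIIME2 --input-format for a set of demultiplexed FASTQ files."""
    names = list(filenames)
    if names and all(parse_casava(n) is not None for n in names):
        return "CasavaOneEightSingleLanePerSampleDirFmt"
    # Non-Casava file names are described to QIIME2 through a manifest file instead.
    return "PairedEndFastqManifestPhred33V2" if paired else "SingleEndFastqManifestPhred33V2"
```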
Some laboratories have their own sequencers, while others depend on genomics core facilities
for sequencing. In either case, the raw data are provided in one of the above formats. Raw
sequence data can also be downloaded from metagenomics databases. Examples include the NCBI
SRA database, available at “https://www.ncbi.nlm.nih.gov/sra”, and the EMBL-EBI-hosted
MGnify database, available at “https://www.ebi.ac.uk/metagenomics/”. Both databases provide
data generated from a variety of microbiome studies of specific environments, such as human
body sites, soil, and seawater. MGnify microbiome data can also be accessed from the NCBI
SRA. Sequence data, such as the FASTQ files generated in these studies, are stored in the
NCBI SRA as compressed sequence read archive (SRA) files, which can be downloaded with the
SRA Toolkit.
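With the SRA Toolkit installed, a run is typically fetched with `prefetch` and then converted to FASTQ with `fasterq-dump`. The sketch below only builds those command lines and does not run them; the accession pattern (SRR, ERR or DRR followed by at least six digits) and the output directory name `fastq` are assumptions for illustration, and the helper names are hypothetical.

```python
import re
import shlex
from typing import List

# Assumed run-accession pattern for SRA, ENA and DDBJ runs (e.g. SRR1234567).
RUN_ACCESSION = re.compile(r"^[SED]RR\d{6,}$")


def sra_download_commands(run_id: str, outdir: str = "fastq") -> List[List[str]]:
    """Build (but do not execute) SRA Toolkit commands that fetch one run as FASTQ."""
    if not RUN_ACCESSION.match(run_id):
        raise ValueError(f"not a run accession: {run_id!r}")
    return [
        # Download the compressed .sra archive for the run.
        ["prefetch", run_id],
        # Convert it to FASTQ; --split-files writes paired mates to separate files.
        ["fasterq-dump", "--split-files", "--outdir", outdir, run_id],
    ]


def as_shell(commands: List[List[str]]) -> List[str]:
    """Render each command as a shell-quoted string for logging or a job script."""
    return [shlex.join(cmd) for cmd in commands]
```

Each inner list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as lists rather than one string avoids shell-quoting problems.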
Whether the input for QIIME2 is a FASTA file, FASTQ files, or a feature file, it must be
imported into QIIME2 and converted into a QIIME2 artifact. How an input file is imported
depends on its format; each file format is imported in its own way. In general, however,
any input data is imported with “qiime tools import”. We already